BUG: Fix pyarrow and numpy logical bug concerning bool and string #60529
+87
−6
doc/source/whatsnew/v3.0.0.rst file if fixing a bug or adding a new feature.

Using logical operators (e.g., `|`, `&`) on non-boolean data, where the data should be cast to bool, works for most types (e.g., floats, strings). However, these operations fail with pyarrow-backed strings and numpy-backed strings.
This PR fixes the issues with pyarrow-backed string arrays by casting them into boolean arrays when they are used with logical operators. The newly implemented helper functions `convert_string_to_boolean_array` and `cast_for_logical` perform the casting, while the `ARROW_LOGICAL_FUNCS` dictionary has been modified to use these helper functions when performing logical operations (see pandas/core/arrays/arrow/array.py).

This PR fixes the issues with numpy-backed string arrays by casting them into boolean arrays whenever they are used with boolean arrays in logical operations. This is done in the `logical_op` function (see pandas/core/ops/array_ops.py).
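
For reviewers who want a mental model of the cast-then-operate idea described above, here is a minimal pure-Python sketch. It is not the code in this PR: `strings_to_bools` and `logical_op_sketch` are hypothetical names, plain lists stand in for arrays, and the conversion rule (Python truthiness, so only the empty string is falsy) is an assumption, since the description does not spell out the exact rule the helpers apply.

```python
import operator
from typing import Callable, List, Sequence


def strings_to_bools(values: Sequence[object]) -> List[bool]:
    """Hypothetical stand-in for the string-to-boolean cast.

    Assumption: Python truthiness is used, so "" is False and every
    non-empty string (including "False") is True.
    """
    return [bool(v) for v in values]


def logical_op_sketch(
    left: Sequence[object],
    right: Sequence[object],
    op: Callable[[bool, bool], bool],
) -> List[bool]:
    """Cast both operands to booleans, then apply `op` element-wise."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    left_bools = strings_to_bools(left)
    right_bools = strings_to_bools(right)
    return [op(a, b) for a, b in zip(left_bools, right_bools)]


# String operand combined with a boolean operand.
or_result = logical_op_sketch(["a", ""], [False, False], operator.or_)
and_result = logical_op_sketch(["a", ""], [True, True], operator.and_)
```

Under these assumptions the string `"False"` is treated as true, which may or may not match the behavior the PR settles on; that is a design point worth confirming against the tests added here.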